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Abstract. We consider two-stage hierarchical area-level models based on natural ex¬ 
ponentially family for small area estimation. While the areas are treated exchangeably 
and the model parameters are assumed to be the same over the areas, we might lose the 
efficiency when there exist spatial heterogeneity. To overcome this problem, we here 
propose two-stage area-level model with spatially varying model parameters and lo¬ 
cal marginal likelihood approach to estimating these parameters to compute empirical 
Bayes estimators of area means. We also discuss some related problems including mean 
squared error estimation, benchmarked estimation and estimation in non-sampled ar¬ 
eas. The performance of the proposed method is evaluated through simulations and 
applications to two data sets. 
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1 Introduction 

Small area estimation is widely used to produce reliable estimates of area (cluster) 
means with small, or even zero sample sizes. In the case of a small sample size, it 
is well-recognized that the direct estimator based only on the area-specific samples 
has the high variability and is not appropriate for practical use. Hence, we need to 
“borrow strength” from related areas and produce indirect (model-based) estimates 
of area-specific means. For the purpose, mixed modeling or hierarchical modeling 
approach has been used as a standard statistical tool. For comprehensive reviews of 
small area estimation techniques, see Pferffermann (2013) and Rao and Molina (2015). 

In this paper, we focus on hierarchical area-level models for summarized area-level 
data. Let m be a number of areas, {yi, be the sampled data, where y* is 

the direct estimator of a area mean yj, satisfying E[yj|/rj] = pi, and xi is a vector 
of covariates associated with y*. Typically, y* is an unstable estimator of //j in the 
sense that Var(yj|yj) is large due to the small sample size within the area, thereby we 
aim to increase the accuracy of estimating y* by computing a model-based (indirect) 
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estimator. To this end, we consider the area-level hierarchical model proposed by 
Ghosh and Haiti (2004), described as 

f{yi\di) = ew{ni{0iyi - ipiOi)) + c(yi,ni)}, 

Tr{9i;cf)) = exp{i^{mi9i - il){9i)) + C{i',mi)}, 

where rrii = 'il;'{x\(3) with 9i is a natural parameter, rij is a known scalar 

value typically related to the area sample size, cj) = is a vector of unknown 

model parameters common in all the areas, and C'(-, •) are functions specific 

to each distribution. Under the model, the area mean /r* is expressed as 

yi = ^[yi\9i] = ip'idi), 


noting that E[/rj] = under ([T]). Moreover, it is assumed that the conditional variance 
is a quadratic function of the conditional mean /Xj, namely Var(?/j|0j) = 
where Q{x) = uq -|- vix + V 2 x‘^ for known constants vq, vi, and V 2 , which are not 
simultaneously zero. It is noted that Var(/rj) = Qirrii)/{v — V 2 ) under this assumption. 
Then the distributions of yi\9i and 9i in ([T]) with the form of the conditional variance 
belong to the natural exponential family with quadratic variance function studied by 
Morris (1982, 1983). The model ([T|) is flexible enough to use in practice since the 
model includes well-used area-level models as special cases, Fay-Herriot model (Fay 
and Herriot, 1979), Poisson-gamma model (Clayton and Kaldor, 1987) and binomial- 
beta model (Williams, 1975). Hence, the formulation ([1]) enables us to deal with typical 
three models in the same framework. 

It is remarked that the distribution of 9i in 0 is the conjugate prior. Then, 
the posterior distribution of 9i given yi has the same form as 7r(0; 0), and the Bayes 
estimator or the conditional expectation of yn can be obtained as 


yi = yiiyi,4>) 


Uiyi + urrii 
rii + u 


( 2 ) 


which is the weighted mean of the direct estimator yi and prior (synthetic) mean 
rrii with the weight determined by n* and v. Moreover, owing to the conjugacy, the 
marginal distribution of yi can be expressed as 


f{yi] 0) = exp {c{yi,ni) + C{v,mi) - C{ni + iy,yi)}, 


thereby the maximum likelihood estimator of (p can be defined as the maximizer of the 
function with an analytical expression, f [yy (p). By replacing the unknown 

parameter cp with its maximum likelihood estimator, we easily get the empirical Bayes 
estimator yi = yi{yi, cp) of yi. On the other hand, the generalized linear mixed models 
(Jiang, 2006) suffer from analytical tractability. Specifically, the posterior distribution 
of the area mean as well as the marginal distribution cannot be obtained in closed 
forms, so that we must rely on some computer intensive numerical method to com¬ 
pute the estimator of the area mean. Hence, the model dU can be seen as an useful 
alternative to the generalized linear mixed models in area-level data analysis. 

It should be pointed out that the model parameters in ([T]) are assumed to be the 
same over all the areas, which means that all the areas are treated exchangeably. How¬ 
ever, one might lose the efficiency of estimating the area mean through the non-spatial 
model © if there is spatial heterogeneity. In the context of generalized linear mixed 
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model, the simultaneous autoregressive structures for spatially correlated normal ran¬ 
dom effects are often used in both normal and non-normal models, for example, in 
Bandyopadhyay et al. (2009), Marhuenda et al. (2013), Wakefield (2007). On the 
other hand, in the context of the modeling spatial correlations among 0j’s is not 
straightforward. Instead, we assume that the model parameters 0 in ([1]) smoothly 
vary from areas to areas. Specifically, letting Ui is the location of fth area, we allow 
0 to depend on Ui, namely 0 = (j){ui). For estimating the spatially varying param¬ 
eter cl){ui), we apply the local likelihood method (Tibshirani and Hastie, 1987) to 
the marginal likelihood. Clearly, this method is closely related to the geographically 
weighted regression (Brunsdon et ah, 1998; Fotheringham et ah, 2002) in which the 
regression coefficients are assumed to be different in each location. In the context of 
small area estimation, the technique was used in Salvati et al. (2012) and Chandra et 
al. (2012) for continuous data, and Chambers et al (2014) for count data while the 
method is not based on hierarchical models but on quantile regressions. 

The rest of the paper is organized as follows. In Section [2l we propose spatially 
varying empirical Bayes methods and discuss some related problems including mean 
squared error estimation, benchmarked estimation and estimation in non-sampled ar¬ 
eas. In Section [3l we describe typical three models belonging to our framework. In Sec¬ 
tion m we evaluated the finite sample performances of the proposed methods through 
simulations. In Section [5l we show two results of applications to Scottish lip cancer 
data by Poisson-gamma model and Spanish poverty rate data by binomial-beta model. 
Finally, some discussions are given in Section [6j 

2 Spatially Varying Empirical Bayes Methods 

2.1 Spatially varying models and local likelihood estimation 

We introduce a spatially varying area-level models by allowing the model parameters 
in m to vary spatially, which are described as 

fiVilOi) = exjp{ni{eiyi - 0 ( 00 ) +c{yi,ni)}, 

7r{6i] (t){ui)) = exp [v{ui){mi{ui)Bi - ijj{6i)) -h C{n{ui),mi{ui))}, 

where mi{ui) = 0'(a;*/3(uj)), Ui = (uii,U 2 i) represents the coordinate of the fth area, 
and (t){ui) = {P{uiY,h'{ui)Y is the spatially varying model parameters. It is noted 
that the first stage model of yi\9i is the same as ([T]) while the prior distribution of 9i 
is spatially varying. 

To estimate the spatially varying parameters cj){ui), we use the local likelihood 
method (Tibshirani and Hastie, 1987) and suggest estimating by maximizing 

the following locally weighted log-likelihood function; 

m 

i{ct){ui)) = '^w{\\ui - Uk\\)^C{n{ui),mk{ui)) - C{nk + i^(u0,/ifc(j/fc, 0(«i)))|, (4) 

k=l 

where w{-) is a user-specified kernel function and JlkiUk, = {'^k+^{V'i))~^{nkyk + 

v{ui)mk{ui)) with mk{ui) = p' {x\p{ui)) . The weights w{\\ui — Uk\\) should gradually 
decrease as the distance between two locations Ui and Uk becomes larger. A common 
choice of the kernel w{-) is the Gaussian kernel (Brundson et ah, 1996) defined as 

w{\\ui - Uk\\) = exp 
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where b is the bandwidth controlling the rate at which the weight declines as the 
distance between two locations. The weight decays slowly under large b while the 
weight decays rapidly under small b. 

As mentioned earlier, the local likelihood depends on the unknown bandwidth b. 
Following Fotheringham et al. (2002), we select the bandwidth b based on the cross- 
validation criterion given by 


cv{b) = Y,{y^-yi-^){b)}\ ( 5 ) 

i=l 


where y(^_i-j{b) is the estimated value for location i, omitting the observation ?/j. Specif¬ 
ically, we use the synthetic estimator = ip'{x\l3, where is 

the estimated values from the local likelihood without y*. For searching b minimiz¬ 
ing CV(6), we use golden section search (Brent, et al, 1973) over the interval 
Here 6^ > 0 should set to be small, for example bi = 0.01, and we suggest setting 
bu = 2maxj_fc ll^j - Ufc|p. 

Under the model ([3]), the Bayes estimator of /i* = E[yi|0j] is the same form as ([2]) 
and the empirical Bayes estimator is 




njUi + u{ui)fhi{ui) 

rii + d{ui) 


which we call spatially varying empirical Bayes (SVEB) estimator. 


2.2 Hybrid bootstrap mean squared error estimation 

In real applications, measuring uncertainty of the empirical Bayes estimator is required 
to asses the reliability of the estimates. Traditionally, an estimator of mean squared 
errors have been used for the purpose, see Prasad and Rao (1990) and Datta et al. 
(2005). 

The MSE of the empirical Bayes estimator can be expressed as 
MSEi = E [{-fii - yiif] = E [{yi - paf] + E [{yi - Jaf] 

= Rii{(f){ui)) + R2i{(f){ui)), 


since fii = E[fii\yi]. Owing to the quadratic variance function, it follows that 


Rii{4>{ui)) 


u{ui)Q{mi{ui)) 

{Ui + v{Ui)){u{Ui) - U2)‘ 


Concerning the second term R 2 i{cl){ui)), it vanishes as m —>• oo, thereby a naive (prim¬ 
itive) estimator of MSE is 

= RiSiui)). (6) 

However, if m is not sufficiently large, i?2i is not necessarily negligible, and the naive es¬ 
timator could underestimate the true MSE. Moreover, the plug-in estimator Rii{(f){ui)) 
is known to have a considerable bias. 

To estimate the MSE more accurately than the naive MSE estimator Q, we use 
the hybrid bootstrap approach used in Butar and Lahiri (2003). Let {yf,... ,2/^} be 
the parametric bootstrap samples generated from the model ([3]) with 
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and define 4> {ui) as the estimator computed from the bootstrap samples. Then the 
hybrid bootstrap MSE estimator is given by 

MSEi = 2Rii{4>{ui)) - ^ X] + ;^ X] 4>\ui)) - 0(iij))| . 

^ S = 1 ^ S=1 

(7) 

Note that the last term corresponds to the estimator of R 2 i, and first two terms 
correspond to a bias-corrected estimator of Ru. Here we used an additive form for 
bias correction in estimating Ru, other several forms have been proposed, see Hall and 
Maiti (2006). 


2.3 Benchmarked estimation 

The potential difficulty of an empirical Bayes estimator is that a (weighted) sum of 
empirical Bayes estimates is not necessarily equal to the corresponding direct estimates. 
Moreover, empirical Bayes estimates sometimes produce over-shrunk estimates, which 
results in inaccurate estimates of small area means. To avoid these problems, the 
benchmarked estimator (Datta et ah, 2011; Bell et al, 2013) has ben used as a standard 
tool in small area estimation. Here we consider the constraint YliLi CiUi 

with some known weight Ci satisfying = 1- A typical choice is q = ni/ 

Erom Datta et al. (2011), the constrained empirical Bayes estimator Jlf that minimizes 
the squared error ^ E[(/ip — /r*)^] has the form 

m 

h? = fli + u;i'^Ck ivk - hk ), ( 8 ) 

k=l 

with uoi = Cil The weight q often satisfies maxi<j<mQ = like 

Ci = ni/ Then, the difference between flf and /r, decreasing as the number 

of areas m gets large, namely, the differences are negligible when m is sufficiently large. 

Since the benchmarked estimator increases the mean squared errors compared to 
the empirical Bayes estimator, we need to assess how large the excess MSE is. Regard¬ 
ing this issue, Steorts and Ghosh (2013) and Kubokawa et al. (2014) investigated the 
MSE estimators of benchmarked empirical Bayes estimators in area-level models via 
analytical or numerical way. Here, Similarly to Kubokawa et al. (2014), we consider a 
bootstrap method for evaluating the excess MSE. The excess MSE is expressed as 

EMSEi = E [(/2f - fii)^] - E [iJii - yi)^] 

= E [(/if - yif] + 2E [(/If - /Ii)(/Ii - Jli)] . 


Therefore, the similar parametric bootstrap procedure used in the previous section 
enables us to estimate the excess MSE: 


EM^i 



S = 1 S = 1 


(9) 


where /If = /ii(y|,(f) (ui)) and /if’^ is the benchmarked estimator ([8]) by replacing yi 
and /Ij with yf and /If, respectively. 
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2-4 Estimation in non-sampled area 


In real applications, there could exist small areas with zero sample sizes. Let j be the 
index of a non-sampled area, and it is assumed that the covariate Xj is available. In 
the traditional hierarchical model ([T]), the reasonable estimator of pj is 

Pj{(3) = mj = 

which is obtained by putting n* = 0 in the Bayes estimator ([2]). Similarly, we can 
define the estimator of pj under the spatially varying model Q as 

= mj{uj) = il)'{x](3{uj)), ( 10 ) 

which is expected to provide more accurate estimates of pj than the traditional one by 
using local information of non-sampled areas. The estimator of P{uj) can be obtained 
by the local weighted likelihood ([3]) without the information in area j: 

m 

^ w{\\uj - Uk\\)^C{i'{uj),mkiuj)) - C{nk + n{uj),pk{yk,(l>{uj)))y 
k=l,k^j 

Thus the empirical version of (llOh is given by mj{uj) = ip' {x4'-(3{uj)). The MSE of 
rhjiuj) can be expressed as 

MSEj = E [{fhj^Uj) — Pj)"^] = E [{mj{uj) — mj -\- mj — Pj)"^] 

= + E Ufhj{uj) - nij)"^] -\- 2E [{fhj{uj) - mj){mj - pj )], 

U — V2 

which can be estimated by the similar parametric bootstrap method given in Section 

ESI 


3 Typical Models 


3.1 Fay-Harriot model 


When we assume that distributions of ypOi and 9i are both normal with n* = D~^, v = 
A~^, ip{9i) = 9f/2, vi = V 2 = 0 and vq = 1, the model Q corresponds to the Eay- 
Herriot model (Fay and Herriot, 1979) with spatially varying parameters, described 
as 

Pi = x\P{ui) + ^/A{ui)vi +-s/dJiSi, z = l,...,m, 

where Uj’s and Ej’s are mutually independent standard normal random variables, and 
DiS are known sampling variances. Under the model, the marginal distribution of pi 
is also normal, N{x\(3{ui).,A{ui) Di), thereby the Fisher-scoring algorithm is easily 
implemented for maximizing local likelihood ([1]). The algorithm entails updating the 
current estimates /3) ' and d) ' as 


m f 


-1 


gC+i) = gC) + W 

■ ■ IfeT'+o., 


E 


WikiVk - xlpP)xk 


k=l 
-1 , 


+ D, 


th+i) _ /tW 


= 4 ^^- E' 


Wik 


=1 ^ + ^kY ) k=l \ (A ^ + ^kY A ^ + ^k 


E 


Wik 


ivk - xlp^pY 


(r) 


ih) 


where Wik = tc(||ui — Uk\\). This step is repeated until numerical convergence to get 
the estimates (3^ = P{ui) and Ai = A{ui). 


6 









3.2 Poisson-gamma model 


When the distributions of Zi{= niyi)\Xi and Aj = exp(0j) are assumed to be Poisson 
and gamma, respectively, with 'ip{6i) = exp(0j), vq = V 2 = 0 and ui = 1, the model Q 
is expressed as 


ZilAj ~ Po(niAi) Xi r^r{i^{ui)mi{ui),u{ui)), i = 


where Ai,...,Am are mutually independent, Po(A) denotes the Poisson distribution 
with mean A, and r(a, 6) denotes the gamma distribution with density 


fix) 


ba 


^exp(-te). 


X > 0. 


The model corresponds to the Poisson-gamma model proposed in Clayton and Kaldor 
(1987) with spatially varying model parameters. It is well-known that the marginal 
distribution of Zi is the negative binomial distribution with probability function 




T{zi + v{ui)mi{ui)) / m A / Vj 
T{zi + l)T{v{ui)mi{ui)) \ni + uiui)) \ni + i>{ui) 


u(ui)mi(ui) 


and the local likelihood ([3]) is similar to the likelihood of geographical weighted negative 
binomial regression model suggested in Silva and Rodrigues (2014). For maximizing 
the weighted likelihood ([3]), we might use the iteratively reweighted least squares pro¬ 
cedure proposed in Silva and Rodrigues (2014). On the other hand, owing to the 
conjugacy of the prior distribution of Aj, the Expectation-Maximization (EM) algo¬ 
rithm (Dempster et al, 1977) is also an attractive tool. With the current estimates 
and y‘f\ the algorithm entails computing E[logtti] with Ui ~ T{niyi + 
and rn'p = exp(a;(/3^^^) as the E-step, and updating the current values by maximizing 
the weighted gamma likelihood with respect to {3 and zz: 


m 

E 

2=1 


w{\\ui - UkW) < urrii logiz - logr(izmi) -k izmjE[log tt*] - v 


I (^) (^) 

riiyi -k vl 'ml 


rii + K 


(r) 


3.3 Binomial-beta model 


When the distributions of Zj(= niyi)\pi andp* = logistic(0j) with logistic(x) = exp(x)/(l-k 
exp(x)) are assumed to be binomial and beta, respectively, with '0(^0 = log(l + 
exp(0j)), xo = 0, xi = 1 and V 2 = —1, the model ([3]) is expressed as 

Zi\pi ~ Bin(ni,pi) pi ~ Beia.[v{ui)mi{ui),v{ui)(l - mi{ui))), i = 1,... ,m, 
where Beta(o, b) denotes the beta distribution with density 

/(x) = B{a, 6)“^x““^(l — x)^“^, 0 < X < 1, 


and B{a,b) is the beta function. This model can be regarded as the extension of 
the binomial-beta model used in Wiliams (1975) in terms of the spatially varying 
hyperparameters. Under the model, the marginal probability function of Zi can be 
obtained as 


/m(-2i, 0('U.i)) — 


A B{zi -k v{ui)mi{ui),ni - Zj + zz('Ui)(l - mi{ui))) 

J B{v{ui)mi{ui),v{ui){l - mi{ui))) 


m 

.^i 
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which is not a familiar form. For maximizing the weighted likelihood ([3]), we can use the 
EM algorithm owing to the conjugacy of the prior of pi. Based on the current estimates 
(3!f^ and we first compute the two expectations E[logtij] and E[log(l — Ui)] with 
Ui ~ 'Beta.{zi + rn[\ni — Zi + and rn'p = and then 

update the current estimates by maximizing the weighted beta likelihood with respect 
to (3 and v. 

m 

'^w{\\ui - Ufcll) {i/mjE[logUi] + Z/(1 - mj)E[log(l - Ui)] - \ogB{vmi,v{l - m,))} . 

i=l 

4 Simulation studies 

^.1 Estimation error comparison in sampled area 

We first investigate estimation errors of the proposed estimator with the traditional 
estimator in finite samples. We first considered the Poisson-gamma model described in 
Section [3l The coordinates Ui = {uii,U 2 i) were generated from the uniform distribution 
on (0,1) X (0,1), and covariate Xi was generated from the uniform distribution on 
(—1,1). Then we generated the simulated data from the following Poisson-gamma 
model: 

Zjl/Zj ~ Po(ni^j), Pi z = l,...,m, (11) 

where m = 60, Ui = 20, Ui = {uii,U 2 i) and mi{ui) = exp(/3o(uj) -|- f5i{ui)xi). For 
setting hyperparameters, we considered two scenarios: (I) spatially varying: (5Q{ui) = 

uii - U 2 i - 1, I3i{ui) = uf - + ul- and iz{ui) = m exp(ttii -k U 2 i - 1), and (II) spatially 
constant: /3o(uj) = 0.1, /3i(uj) = 0.7 and n{ui) = 50. It should be noted that the 
non-spatial Poisson-gamma model is the true model in setting (II) and the proposed 
spatially varying method is over-specified in this case. We applied both patially varying 
(SV) method and spatially constant (SC) method assuming /3o(uj) = /3o,/3i(ui) = /3i 
and ^{ui) = to the simulated data and computed the empirical Bayes estimates pi 
of Pi- Based on ii = 1000 simulation runs, we simulated the area-level MSE defined as 

R 

r=l 

where p\ and /i) ^ denotes the estimated and true values of pi in the rth simulation 
run. To compare the results between SV and SC, we computed the percent relative 
difference of the root of MSE: 

MSEfV _ ./MSEfC 

- ^ ^ X 100, (13) 

^MSEfc 

where MSE®^ and MSE®*^ are the simulated MSE values of SV and SC, respectively. In 
the upper panel of Figure [H we present the scatter plots of RDj in two scenarios, noting 
that the negative value of RDj means the proposed SV method provides more accurate 
estimate than the SC method. From Figure [H it is revealed that SV considerably 
improved the estimation accuracy over SC under scenario (I). On the other hand, it is 
natural result that SV is inefficient compared to SC under scenario (II) since SV uses 
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only local information for estimating hyperparameters. However, it should be pointed 
out that the difference between SV and SC is quite small in scenario (II) compared to 
the amount of improvement in scenario (I). 

We next considered the Binomial-beta model: 


Zil/Lii ~ Bin(ni,//j), m Beta{u{ui)mi{ui),u{ui){l - mi{ui))), i = (14) 

where m = 60, n, = 20 and mi{ui) = logistic(/3o(rii) -|- I3i{ui)xi). We again considered 
the same two scenarios for hyperparameters. Based on i? = 1000 simulation runs, 
we computed the simulated MSE (I12p . and the percent relative differences (1131) in two 
scenarios are presented in the lower panels of Figure [TJ It is revealed that the proposed 
SV method works well in the binomial-beta model. 

4-2 Estimation error comparison in non-sampled area 

We next investigated the estimation errors in non-sampled areas as discussed in Sec¬ 
tion Ea To this end, we considered 80 small areas in which the fist m = 60 areas 
are sampled areas and the last A; = 20 areas are non-sampled areas. Similarly to the 
previous section, we set n* = 20 for all sampled areas, and used the same data gen¬ 
erating processes for the coordinates ui = {uii,U 2 i) and covariate Xi. Concerning the 
hyperparameters, we again considered the same two scenarios as used in Section 14.11 
We first considered the Poisson-gamma model. The true means pn were generated from 
the following model: 

Pi ~ r(z^(iii)exp(/3o(wj) + I3i{ui)xi),v{ui)), i = l,...,m + k. 

On the other hand, the observations were generated only in the hrst m areas: 

Zi\pi r~^Y’o{niPi), i = l,...,m. 

Based on the observations zi,..., Zmi we computed the estimates of Pm+i, ■ ■ ■, Pm+k 
in non-sampled area, based on both spatially varying (SV) and spatially constant (SC) 
methods given in section 12.41 Based on ii = 1000 iterations, we simulated the area- 
level MSE dehned in (112p in non-sampled area i = m-|-l,...,m-|-A:. To compare the 
results, we calculated the percentage relative difference (fOp and present in Figure [2] 
in two scenarios. Moreover, we also considered the binomial-beta model: 

Zi\pi r^Bui{ni,pi), i = l,...,m. 

Pi ~ Beta{i'{ui)mi{ui),i'{ui){l - 771 ^( 2 /^))), i = 1,... ,m-\-k, 

with mi{ui) = logistic(/3o(wi) -|- /3i(iii)xj), and computed the percentage relative dif¬ 
ference (113p between SV and SC methods in the same two scenarios, which are shown 
in Figure El From Figure El we can observe the similar results to the previous section, 
namely, SV works much better than SC in scenario (I) and SV produces inefficient 
estimates compared with SC in scenario (II) while the differences are quite small. 

4-3 Finite sample performances of MSE estimators 

We hnally investigate the hnite sample performance of MSE estimator developed in 
Section E21 Similarly to the previous studies, we considered both Poisson-gamma and 
binomial-beta models. We assume there are m = 50 areas divided into five groups with 
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Figure 1: Ratios of the MSE (upper) and the bias (lower) of the predictors in the spatial 
and non-spatial binomial-beta model under a spatial non-stationary population (left) 
and a stationary population (right). 
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Figure 2: The percentage relative differences between the spatially varying method 
and spatially constant method in two scenarios in Poisson-gamma and binomial-beta 
models. 
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equal number of areas. The area-specific constant Uj are the same for areas within the 
same group. We used (10,15, 20, 25, 30) for the group patterns of n*. 

The coordinate Ui = {uii,Ui 2 ) were generated from the uniform distribution on 
(0,1) X (0,1) and covariate Xi from the uniform distribution on (—1,1). In both 
Poisson-gamma m and binomial-beta (|14ll models, we set the true hyperparameters 

as (3o{ui) = uii - U 2 i - 1, /3i{ui) = ^{ui) = 30exp(riij -h U 2 i - 1). 

We first simulated the MSE m based on R = 100 simulation runs, which are used 
as the true values of MSE in each area. For estimating these true MSEs, we used 
the bias-corrected MSE estimator given in d?!) with B = 100 bootstrap samples. For 
comparison, we also considered the naive estimator ([6]) which has considerable bias 
when m is not sufficiently large. Based on 5 = 100 iterations, we calculated the 
percentage relative bias (RB) and coefficient of variation (CV), which are defined as 


1 MSe!'^ - MSE,; 
RBi = ^ 2^ - 

S=1 


MSE,' 


CV,; = 


1 A /M^f^ -MSE,;^ 2 




MSE,' 


where MSEj is the MSE estimate in the sth iteration and MSEj is the true value. In 
Table [H we report the averaged values of RB and CV within the same groups. It is 
noted that RB and CV of the naive MSE estimator are represented by RBN and CVN, 
respectively. From Tabled! it is revealed that the naive MSE estimator has the serious 
negative bias even when m = 50, which comes from the characteristics that the naive 
MSE estimator ignores the variability in estimating unknown hyperparameters. On the 
other hand, the bias-corrected MSE estimator provides relatively accurate estimates 
in terms of both RB and CV owing to the bootstrap bias correction. 


Table 1: Percentage relative bias (RB) and coefficient of variation (CV) of the 
bias-corrected MSE estimator and the naive MSE estimator in Poisson-gamma and 
binomial-beta model. The RB and CV for the naive estimator are denoted by RBN 
and CVN, respectively. 



m 

10 

15 

20 

25 

30 

Poisson-gamma 

RB 

-6.30 

16.37 

-3.44 

7.58 

1.68 


CV 

43.87 

45.26 

40.41 

36.49 

32.14 


RBN 

-28.51 

-10.89 

-25.00 

-13.58 

-18.39 


CVN 

49.97 

45.23 

48.28 

43.94 

38.99 

Binomial-beta 

RB 

-0.84 

-5.03 

-6.24 

-0.55 

2.05 


CV 

47.37 

39.93 

38.41 

36.45 

32.35 


RBN 

-27.66 

-33.19 

-29.70 

-28.18 

-23.03 


CVN 

48.27 

47.53 

46.14 

43.71 

37.67 


5 Examples 

5.1 Scottish lip cancer data 

We first apply the proposed method to Scottish lip cancer data during the 6 years 
from 1975 to 1980 in each of the m = 56 counties of Scotland, which was also analyzed 
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in Clayton and Kaldor (1987). For each county, the observed and expected number 
of cases are available, which are respectively denoted by Zi and n*. Moreover, the 
proportion of the population employed in agriculture, fishing, or forestry is available 
for each county, thereby we used it as a covariate AFFj, following Wakefield (2007). 
For each area, i = 1,..., m, we applied the spatially varying Poisson-gamma model: 

ZilAi ~ Po(njAi), \i r^T{v{ui)exp{l3i{ui) + l32{ui)KF¥i),v{ui)), (15) 

where Ui = {uii,Ui 2 ), and un and Ui 2 are standardized longitude and latitude. 

We first searched the optimal bandwidth by minimizing the criteria Q and got b* = 
0.900. Then we computed the estimates of hyperparameters as well as the spatially 
varying empirical Bayes (SVEB) estimates of A* with 6 = 6*, which are shown in Figure 
[3l From Figure [3l it is observed that the hyperparameter estimates are dramatically 
changing from areas to areas. For comparison, we applied the conventional Poisson- 
gamma model: 

ZilAi ~ Po(niAi), Aj ~ F(i/exp(/3i-h; 82 AFFi),i/), (16) 

and the maximum likelihood estimates of the hyperparameters are D = 2.13,/3i = 
—0.15 and (32 = 5.18. 

Let A®^^® and Af® be the spatial varying empirical Bayes (SVEB) estimates from 
(jl5p and the empirical Bayes (EB) estimates from (1161) . respectively. In the left panel 
of Figure m we show the sample plot of the percentage relative difference 100 x (Af® — 
against the log expected number of cases logrij. We can observe that 
the differences are larger in areas with small n* while the difference gets smaller as 
Hi gets larger. This is because both SVEB and EB estimators are close to the direct 
estimator y* in areas with large n*. In the right panel of Figure 01 we present the 
sample plot of the square root of MSE (RMSE) estimates based on 500 bootstrap 
samples against logn*, which reveals that the RMSE decreases as n* increases. 

Finally, we computed the benchmarked estimator Xf from (jH]) with the weight 
Ci = relative differences to Aj = A®^^^ are presented in the left 

panel of Figure [5l The figure shows that the differences are increasing with respect 
to Ui, which comes from the choice of the benchmarking weight c*. However, in most 
areas, the relative differences are smaller than 2%, so that Ap and A* is quite similar. 
Based on 500 bootstrap replications, we calculated the excess MSE estimates by using 
(lH) and computed the ratio to the MSE estimates of SVEB. The histogram of the ratio 
is given in the right panel of Figure 01 which shows that the percentage of the risk 
inflation is at most 1.4%. 

5.2 Spanish Poverty rates data 

We next use synthetic income data set in Spanish provinces, which is available in the R 
package sae. In the data set, unit data are available for 52 areas. Let W ,i = 1,... ,m 
denote the population sizes of the areas. Let Eij be the equivalized disposable income 
calculated following the standard procedure of the Spanish Statistical Institute, and z 
be the poverty line. The poverty rate for area i is defined as pi = N~^ < ^)- 

Unfortunately, we do not observe all Eij's but observe only Eij,j = 1,..., rij. A direct 
estimator y* of pi is given by 

^ ni 
2=1 
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Figure 4: Sample plots of the percentage relative difference between SVEB and EB 
estimates: 100 x (Af^ — (left) and the ratio of the squared root of MSE 

(RMSE) estimates of SVEB to those of EB (right) in Scottish lip cancer data. 
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Figure 5: Sample plots of the percentage relative difference between SVEB and the 
benchmarked estimator: 100 x (Ap — Aj)/Aj (left) and histogram of the excess MSE 
estimates (right) in Scottish lip cancer data. 
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Figure 6: Sample plots of the percentage relative difference between SVEB and EB 
estimates: 100 x (left) and the ratio of the squared root of MSE 

(RMSE) estimates of SVEB to those of EB (right) in Spanish poverty rate data. 


where we set z as 0.6 times median of all the observed income Ejj’s, following Molina 
and Rao (2010). As area-level covariates, we used area-level rates of female and labors, 
which are respectably denoted by fe* and lab*. Since two provinces, PalmasLas and 
Tenerife, are very far away from the other provinces, we have omitted in this study. 
Then, we applied the following binomial-beta model for i = 1,..., m: 

UilPi ~ Pi ~ Beta{i^{ui)mi{ui), i^{ui){l - mi{ui))), (17) 

where mi{ui) = logistic(/3i(rii) /32(ui)fei + I32,{ui)\ahi) and Ui = (uii,rii 2 ), and un 
and Ui2 are standardized longitude and latitude. For comparison, we also applied the 
conventional binomial-beta model: 

~ Bin(ni,pi), p* ~ Beta(i/mi, z^(l - m*)), (18) 

with rrii = logistic(/3i -|- /32fei -|- /^alabj). 

We found that the optimal bandwidth is b* = 2.42, and some empirical quintiles of 
the hyperparameter estimates are provided in Table [2j From Table [21 we can observe 
that the medians of spatially varying hyperparameter estimates in spatially varying 
model (HZl) is close to the point estimates in the conventional model (IlSp . In the 
left panel in Figure [H we presented the percentage relative difference 100 x (p™ — 
^SVEB^y/I^SVEB^ where and are empirical Bayes estimates from (I17p and 

(USD, respectively. We can observe that the differences are smaller than 6% in all the 
areas except for the one area, and the differences vanishes as the area sample size n* 
gets large. Based on 500 bootstrap replications, we computed the MSE estimates of 
p^VEB^ and the square root of MSE (RMSE) estimates are given in the right panel of 
Figure [6l which shows the natural result that the MSE decreases with respect to n*. 
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We finally computed the benchmarked estimator of pi in model (1171) . and we found 
that the percentage relative difference between the SVEB and benchmarked estimates 
are smaller than 0.15% and the excess risks in benchmarking based on 500 bootstrap 
samples were negligibly small. 


Table 2: Quantiles of hyperparameter estimates in spatially varying (SV) model and 
point estimates in spatially constant (SC) model in Spanish poverty rate data. 



0 % 

25% 

SV 

50% 

75% 

100 % 

SC 

Estimate 

/3i 

-7.96 

-3.34 

-2.28 

-1.03 

1.11 

-2.70 

/32 

-3.47 

2.13 

3.50 

5.26 

10.65 

3.85 

A 

-4.54 

-2.16 

-1.69 

-0.90 

3.22 

-1.19 

u 

42.33 

44.12 

48.06 

51.59 

103.06 

46.32 


6 Discussion 

We have developed spatially varying empirical Bayes methods based on the local likeli¬ 
hood estimation in which the optimal bandwidth in a kernel function is determined by 
cross validation. The model we considered can be regarded as a generalization of the 
two-stage hierarchical area-level models based on natural exponential family, proposed 
by Ghosh and Maiti (2004). The model includes Fay-Herriot model, Poisson-gamma 
model, and binomial-beta model as special cases, so that the model is applicable for 
both continuous, count and binary data. We have considered some problems including 
the MSE estimation, benchmarking estimation, estimating in non-sampled areas. The 
proposed methods were compared with the conventional non-spatial models through 
simulation and empirical studies, and we have found that the proposed method works 
well and would improve the estimation accuracy of the traditional methods. 

The possible drawback of the proposed method is its computational costs when the 
number of areas m is large. For specified bandwidth b, it requires m times maximization 
of the weighted log-marginal likelihood dH to compute hyperparameter estimates in 
each area, thereby its computational cost increases linearly depending on m. A possible 
solution is to assume that m areas can be classified in G groups, where G is much 
smaller than m, and the hyperparameters are the same in all the areas within the 
same group. This can reduces the number of maximization from m to G for each b. 
However, it is not straightforward how we can efficiently divide the areas. Hence, the 
detailed consideration about the issue seems to exceed the scope of this paper and we 
left the problem as a valuable future study. 
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